Collection Profiling for Collection Fusion in Distributed Information Retrieval Systems

نویسندگان

  • Chengye Lu
  • Yue Xu
  • Shlomo Geva
چکیده

Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalizing scores based merging strategies are well-known approaches to solve such problems. However, such approaches only consider the content of the remote database and do not consider the retrieval performance. In this paper, we address the problem that in peer to peer information systems and argue that the performance of search engine should also be considered. We also proposed a collection profiling strategy which can discover not only collection content but also retrieval performance. Web-based query classification and two collection fusion approaches based on the collection profiling are also introduced in this paper. Our experiments show that our merging strategies are effective in merging results on uncooperative environment.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Distributed IR for Digital Libraries

This paper examines technology developed to support largescale distributed digital libraries. We describe the method used for harvesting collection information using standard information retrieval protocols and how this information is used in collection ranking and retrieval. The system that we have developed takes a probabilistic approach to distributed information retrieval using a Logistic r...

متن کامل

Comparison of different Collection Fusion Models in Distributed Information Retrieval

Distributed information retrieval comes into play when a user wants to get information from di erent sources in parallel. One of the challenges of this topic is the Collection Fusion problem: The distinct result lists of the underlying information retrieval systems (IR) have to be fused to give a global relevance-ranked result list according to the user's information need. In this paper several...

متن کامل

Report on the TREC-5 Experiment: Data Fusion and Collection Fusion

This paper describes and evaluates a retrieval model that considers the problem of data fusion and collection fusion as two faces of the same coin. To establish a clear theoretical foundation for combining various sources of evidence provided either by different search schemes (data fusion) or by distributed information services (collection fusion), we have implemented a retrieval model based o...

متن کامل

Ensuring Retrieval Effectiveness in Distributed Digital Libraries

• collection management; • organizing and indexing the materials for storage We find that dissemination of collection-wide information (CWI) in a distributed collection of documents is needed to and retrieval; achieve retrieval effectiveness comparable to that of a central• user interfaces and human-computer interaction; and ized collection. Complete dissemination is unnecessary. The • interope...

متن کامل

Learning Collection FUsion Strategies for Information Retrieval

In this paper we describe an Information Retrieval problem called collection fusion. The collection fusion problem is to maximize the number of relevant natural language documents retrieved given: a natural language query, multiple collections of documents, and a fixed total number of documents to retrieve. We describe two algorithms that use past queries to learn collection fusion strategies. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007